The level of measurement determines which mathematical (and statistical) operations you can perform
| Mathematical operation | Nominal | Ordinal | Interval | Ratio |
|---|---|---|---|---|
| equal, not equal | \(\checkmark\) | \(\checkmark\) | \(\checkmark\) | \(\checkmark\) |
| greater or less than | | \(\checkmark\) | \(\checkmark\) | \(\checkmark\) |
| add, subtract | | | \(\checkmark\) | \(\checkmark\) |
| multiply, divide | | | | \(\checkmark\) |
| central tendency | mode | median | mean | mean |
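As a small illustration of the last row of the table, the sketch below picks the measure of central tendency that each level supports. The four variables and their values are hypothetical, chosen only to make the arithmetic easy to follow.

```python
from statistics import mean, median, mode

# Hypothetical example data, one variable per level of measurement.
eye_color = ["brown", "blue", "brown", "green", "brown"]  # nominal
satisfaction = [1, 2, 2, 3, 5]                            # ordinal (1 = low, 5 = high)
temperature_c = [18.0, 20.0, 22.0, 24.0]                  # interval
reaction_time_ms = [250.0, 300.0, 350.0]                  # ratio

nominal_center = mode(eye_color)        # only equal / not equal is meaningful
ordinal_center = median(satisfaction)   # ordering is meaningful
interval_center = mean(temperature_c)   # differences are meaningful
ratio_center = mean(reaction_time_ms)   # differences and ratios are meaningful

# A ratio statement needs a true zero point: 350 ms is 1.4 times 250 ms.
ratio_of_times = reaction_time_ms[2] / reaction_time_ms[0]

print(nominal_center, ordinal_center, interval_center, ratio_center, ratio_of_times)
```

The same division applied to the Celsius temperatures would run without error, but the result would not be interpretable, because 0 degrees Celsius is not a true zero.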
Analysis of variance (ANOVA) and linear regression (OLS regression) are both special cases of the general linear model (GLM)
ANOVA
Linear regression
Two equivalent ways to present the linear regression equation
\[\hat{Y}_i = b_0 + b_1 X_{1i} + b_2 X_{2i} + \dots + b_p X_{pi}\]
\[Y_i = b_0 + b_1 X_{1i} + b_2 X_{2i} + \dots + b_p X_{pi} + e_i\]
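The two forms differ only in where the residual sits: the first gives the predicted value, the second adds \(e_i\) to recover the observed value. A minimal sketch with one predictor and hypothetical data (the values of `x` and `y` are made up for illustration) shows this numerically.

```python
# Hypothetical data for a single predictor (p = 1).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.0]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Closed-form OLS estimates for simple regression.
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
s_xx = sum((xi - x_bar) ** 2 for xi in x)
b1 = s_xy / s_xx
b0 = y_bar - b1 * x_bar

# First form: predicted values, Y-hat_i = b0 + b1 * X_i.
y_hat = [b0 + b1 * xi for xi in x]

# Residuals, e_i = Y_i - Y-hat_i.
e = [yi - yh for yi, yh in zip(y, y_hat)]

# Second form: observed value = prediction + residual.
reconstructed = [yh + ei for yh, ei in zip(y_hat, e)]

print(b0, b1)
print(reconstructed)
```

Because the model includes an intercept, the residuals in `e` also sum to zero (up to floating-point error).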
In GLM, we talk about \(\hat{Y}\), the expected or predicted value of \(Y\)
Specifically, we say that \(\eta\) is a function of the predictors (\(X\)s) and regression coefficients (\(b\)s)
\(\eta\) is also called the “linear predictor”
Systematic component: \(\eta = b_0 + b_1X_1 + b_2X_2 + \cdots + b_pX_p\)
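The systematic component is just a weighted sum, which the following sketch computes directly. The function name `linear_predictor` is a hypothetical helper, not part of any library. The first call uses the coefficient estimates printed in the R output in this section, and the value x = 10 is chosen arbitrarily.

```python
def linear_predictor(b, x):
    """Return eta = b[0] + b[1]*x[0] + ... + b[p]*x[p-1].

    b holds the intercept followed by one coefficient per predictor;
    x holds the predictor values for a single case.
    """
    if len(b) != len(x) + 1:
        raise ValueError("need one intercept plus one coefficient per predictor")
    return b[0] + sum(bj * xj for bj, xj in zip(b[1:], x))


# Intercept and slope as printed in the R output (8.5513 and 2.9727),
# evaluated at an arbitrary predictor value x = 10.
eta_one_predictor = linear_predictor([8.5513, 2.9727], [10.0])

# Hypothetical coefficients for two predictors: 1.0 + 0.5*4.0 - 2.0*1.0.
eta_two_predictors = linear_predictor([1.0, 0.5, -2.0], [4.0, 1.0])

print(eta_one_predictor, eta_two_predictors)
```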
| | Linear regression | Generalized linear model (GLiM) |
|---|---|---|
| Estimation | Ordinary least squares (OLS) | Maximum likelihood (ML) |
| Missing data | Listwise deletion | Maximum likelihood (ML) |
| Tests | \(t\)-tests | \(z\) or \(\chi^2\)-tests* |
| Overall fit | \(R^2\) | Pseudo-\(R^2\) |
* For a normal outcome, R reports \(t\)-tests in its GLiM procedure (`glm()`)
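For a normal outcome, the two estimation methods in the first row of the table give the same coefficients, because the Gaussian log-likelihood is a decreasing function of the residual sum of squares. The sketch below checks this on a small grid of candidate coefficients around the OLS solution. The data are hypothetical, and the grid comparison is only an illustration; it is not how `glm()` fits the model, which uses an iterative algorithm.

```python
import math

# Hypothetical data for a single predictor.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.0]
n = len(x)


def rss(b0, b1):
    """Residual sum of squares, the quantity OLS minimizes."""
    return sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))


def gaussian_loglik(b0, b1):
    """Gaussian log-likelihood with sigma^2 set to its ML value, RSS / n."""
    sigma2 = rss(b0, b1) / n
    return -0.5 * n * (math.log(2 * math.pi * sigma2) + 1)


# Closed-form OLS estimates.
x_bar = sum(x) / n
y_bar = sum(y) / n
b1_ols = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sum(
    (xi - x_bar) ** 2 for xi in x
)
b0_ols = y_bar - b1_ols * x_bar

# Candidate coefficient pairs: the OLS solution plus perturbed versions of it.
candidates = [
    (b0_ols + d0, b1_ols + d1)
    for d0 in (-0.5, 0.0, 0.5)
    for d1 in (-0.2, 0.0, 0.2)
]

best_by_rss = min(candidates, key=lambda c: rss(*c))
best_by_loglik = max(candidates, key=lambda c: gaussian_loglik(*c))

print(best_by_rss, best_by_loglik)
```

Both criteria select the unperturbed OLS pair. For non-normal outcomes the likelihood no longer reduces to a sum of squares, so this equivalence does not carry over.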

    Call:
    lm(formula = y ~ x, data = data1)

    Residuals:
        Min      1Q  Median      3Q     Max
    -55.087 -15.069  -0.278  15.475  65.242

    Coefficients:
                Estimate Std. Error t value Pr(>|t|)
    (Intercept)   8.5513     2.5881   3.304  0.00133 **
    x             2.9727     0.4557   6.523 3.03e-09 ***
    ---
    Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

    Residual standard error: 25.27 on 98 degrees of freedom
    Multiple R-squared:  0.3028,  Adjusted R-squared:  0.2957
    F-statistic: 42.56 on 1 and 98 DF,  p-value: 3.025e-09
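The `t value` column in the coefficient table is each estimate divided by its standard error, and with a single predictor the overall F-statistic equals the square of the slope's t value. The sketch below reproduces both from the printed numbers. Because the inputs are rounded as printed, agreement is only approximate in the last digit.

```python
# Estimates and standard errors copied from the printed lm() summary.
estimates = {"(Intercept)": 8.5513, "x": 2.9727}
std_errors = {"(Intercept)": 2.5881, "x": 0.4557}

# t value = estimate / standard error, one per coefficient.
t_values = {name: estimates[name] / std_errors[name] for name in estimates}

# With one predictor, F (1 and 98 DF) is the squared t value of the slope.
f_from_t = t_values["x"] ** 2

for name, t in t_values.items():
    print(f"{name}: t = {t:.3f}")
print(f"F from t^2 = {f_from_t:.2f}")
```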

    Call:
    glm(formula = y ~ x, family = gaussian(link = "identity"), data = data1)

    Deviance Residuals:
        Min       1Q   Median       3Q      Max
    -55.087  -15.069   -0.278   15.475   65.242

    Coefficients:
                Estimate Std. Error t value Pr(>|t|)
    (Intercept)   8.5513     2.5881   3.304  0.00133 **
    x             2.9727     0.4557   6.523 3.03e-09 ***
    ---
    Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

    (Dispersion parameter for gaussian family taken to be 638.64)

        Null deviance: 89764  on 99  degrees of freedom
    Residual deviance: 62587  on 98  degrees of freedom
    AIC: 933.7

    Number of Fisher Scoring iterations: 2
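The two summaries describe the same fit in different vocabulary. For a Gaussian family with identity link, the residual deviance is the residual sum of squares and the null deviance is the total sum of squares, so the fit statistics of the `lm()` summary can be recovered from the `glm()` summary. The sketch below does this with the printed (rounded) values; the variable names are illustrative.

```python
import math

# Values copied from the printed glm() summary.
n = 100                      # residual df (98) plus 2 estimated coefficients
null_deviance = 89764.0      # total sum of squares
residual_deviance = 62587.0  # residual sum of squares
df_residual = 98

# Dispersion parameter = residual deviance / residual df.
dispersion = residual_deviance / df_residual

# Its square root is the residual standard error reported by lm().
residual_se = math.sqrt(dispersion)

# R-squared = 1 - residual deviance / null deviance.
r_squared = 1 - residual_deviance / null_deviance

# Gaussian AIC = -2 * maximized log-likelihood + 2 * k, where k = 3 counts
# the intercept, the slope, and the error variance.
k = 3
aic = n * (math.log(2 * math.pi * residual_deviance / n) + 1) + 2 * k

print(f"dispersion  = {dispersion:.2f}")
print(f"residual SE = {residual_se:.2f}")
print(f"R-squared   = {r_squared:.4f}")
print(f"AIC         = {aic:.1f}")
```

These reproduce the printed 638.64, 25.27, 0.3028, and 933.7. The identities hold only for the normal outcome with identity link, which is why other GLiMs report a pseudo-\(R^2\) instead.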